Transliterated Arabic Name Search
Abstract
We address name search for transliterated Arabic given names. In previous work, we addressed similar problems for English and Arabic surnames. In each of those cases, we used a variant of Soundex together with n-grams to improve the precision and recall of name matching relative to well-known approaches such as the Russell Soundex algorithm. Unlike that prior work, the approach proposed here does not rely on Soundex algorithms. Instead, we experiment with combinations of n-grams of varying lengths, whereas our previous work focused on two-character n-grams. As in our prior work, the approach uses standard SQL and remains portable across relational database engines, as demonstrated by test implementations in the SQLAnywhere and Teradata environments.
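To make the idea of combining n-grams of several lengths in plain SQL more concrete, here is a minimal sketch. It is not the authors' implementation; the abstract does not say how n-grams of different lengths are combined or weighted. Several choices are assumptions made for illustration only:

- Python's built-in `sqlite3` stands in for SQLAnywhere or Teradata.
- The table names `name_gram` and `query_gram` and the helper functions `ngrams`, `build_index` and `search` are hypothetical.
- Names are padded with `#` to mark word boundaries.
- The default n-gram lengths are 2 and 3.
- Candidates are scored with a Dice coefficient over the union of all n-grams.

```python
import sqlite3

# Hypothetical choice: combine 2- and 3-character n-grams.
DEFAULT_LENGTHS = (2, 3)


def ngrams(name, n):
    """Return the set of character n-grams of a lowercased, padded name."""
    padded = f"#{name.lower()}#"  # '#' marks word boundaries (an assumption)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_index(conn, names, lengths=DEFAULT_LENGTHS):
    """Store one row per (name, n-gram length, n-gram) in a plain SQL table."""
    conn.execute("CREATE TABLE name_gram (name TEXT, n INTEGER, gram TEXT)")
    rows = [(name, n, g) for name in names for n in lengths for g in ngrams(name, n)]
    conn.executemany("INSERT INTO name_gram VALUES (?, ?, ?)", rows)


def search(conn, query, lengths=DEFAULT_LENGTHS, limit=5):
    """Rank indexed names by Dice similarity over their combined n-gram sets."""
    conn.execute("CREATE TEMP TABLE query_gram (n INTEGER, gram TEXT)")
    try:
        conn.executemany(
            "INSERT INTO query_gram VALUES (?, ?)",
            [(n, g) for n in lengths for g in ngrams(query, n)],
        )
        # The ranking query uses only joins, GROUP BY and common table
        # expressions; the row limit is applied in Python rather than with a
        # vendor-specific LIMIT/TOP clause.
        rows = conn.execute(
            """
            WITH shared AS (
                SELECT ng.name, COUNT(*) AS common
                FROM name_gram ng
                JOIN query_gram qg ON ng.n = qg.n AND ng.gram = qg.gram
                GROUP BY ng.name
            ),
            sizes AS (
                SELECT name, COUNT(*) AS total FROM name_gram GROUP BY name
            )
            SELECT s.name,
                   2.0 * s.common / (z.total + (SELECT COUNT(*) FROM query_gram)) AS score
            FROM shared s
            JOIN sizes z ON s.name = z.name
            ORDER BY score DESC, s.name
            """
        ).fetchall()
        return rows[:limit]
    finally:
        # Drop the per-query table so search() can be called repeatedly.
        conn.execute("DROP TABLE query_gram")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_index(conn, ["Mohammed", "Muhammad", "Mohamed", "Ahmed", "Mahmoud"])
    for name, score in search(conn, "Mohamad", limit=3):
        print(f"{name}: {score:.3f}")
```

With the sample names in the `__main__` block, the query "Mohamad" ranks Mohamed first, followed by Mohammed and Muhammad. Trigrams reward longer shared runs of letters, while bigrams still give partial credit to vowel variants such as "Muhammad". A real system would need to tune the set of lengths and any per-length weights against labeled data.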
Similar resources
Classifying Arab Names Geographically
Different names may be popular in different countries. Hence, person names may give a clue to a person's country of origin. Along with other features, mapping names to countries can be helpful in a variety of applications, such as tagging Twitter users by country. This paper describes the collection of Arabic Twitter user names that are either written in Arabic or transliterated into Latin characte...
Using Crowdsourcing to Generate an Evaluation Dataset for Name Matching Technologies
Crowdsourcing can be a fast and cost-effective approach to obtaining data for training and evaluating machine learning algorithms. Name matching is the challenging task of identifying which names refer to the same person, which is crucial for effective entity disambiguation and search. While there are a number of name matching technologies available, standardized datasets for evaluating them ar...
An Integrated Approach for Arabic-English Named Entity Translation
Translation of named entities (NEs), such as person names, organization names and location names, is crucial for cross-lingual information retrieval, machine translation, and many other natural language processing applications. New named entities are introduced on a daily basis in newswire, which greatly complicates the translation task. Also, while some names can be translated, others must be...
Transliterated Word Identification and Application to Query Translation Mining
Query translation mining is a key technique in cross-language information retrieval and machine translation knowledge acquisition. For better performance, the queries are classified into transliterated and non-transliterated words based on a transliterated word identification model, and are then channeled to different mining processes. This paper is a pilot study on query classification ...
Encoding transliteration variation through dimensionality reduction: FIRE Shared Task on Transliterated Search
For various reasons, a large amount of user-generated Web content exists in Roman script for languages that are normally written in indigenous scripts. As a result, search engines face the non-trivial problem of matching queries and documents in a transliterated space where the content contains extensive spelling variation. This paper describes our proposed method ...